-
Free, publicly-accessible full text available October 15, 2026
-
Free, publicly-accessible full text available June 11, 2026
-
We propose the Large View Synthesis Model (LVSM), a novel transformer-based approach for scalable and generalizable novel view synthesis from sparse-view inputs. We introduce two architectures: (1) an encoder-decoder LVSM, which encodes input image tokens into a fixed number of 1D latent tokens, functioning as a fully learned scene representation, and decodes novel-view images from them; and (2) a decoder-only LVSM, which directly maps input images to novel-view outputs, completely eliminating intermediate scene representations. Both models bypass the 3D inductive biases used in previous methods, from 3D representations (e.g., NeRF, 3DGS) to network designs (e.g., epipolar projections, plane sweeps), addressing novel view synthesis with a fully data-driven approach. While the encoder-decoder model offers faster inference due to its independent latent representation, the decoder-only LVSM achieves superior quality, scalability, and zero-shot generalization, outperforming previous state-of-the-art methods by 1.5 to 3.5 dB PSNR. Comprehensive evaluations across multiple datasets demonstrate that both LVSM variants achieve state-of-the-art novel view synthesis quality. Notably, our models surpass all previous methods even with reduced computational resources (1-2 GPUs).
Free, publicly-accessible full text available April 24, 2026
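The decoder-only idea described in the abstract can be illustrated with a minimal NumPy sketch: input-view patches and target-view ray embeddings are mapped into one token stream and processed by plain self-attention, with no epipolar or plane-sweep geometry anywhere in the pipeline. All dimensions, weight shapes, and the single attention layer are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(img, p=8):
    """Split an (H, W, C) image into flattened p x p patch tokens."""
    H, W, C = img.shape
    return img.reshape(H // p, p, W // p, p, C).swapaxes(1, 2).reshape(-1, p * p * C)

def attention(q, k, v):
    """Plain scaled dot-product self-attention (single head)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

# Toy dimensions (assumptions, far smaller than a real model)
d = 64
img = rng.standard_normal((32, 32, 3))      # one input view
src_tokens = patchify(img)                  # (16, 192) source patch tokens
tgt_rays = rng.standard_normal((16, 6))     # target-view ray tokens (6-D, Plucker-style)

W_src = rng.standard_normal((src_tokens.shape[1], d)) * 0.02
W_ray = rng.standard_normal((6, d)) * 0.02
W_out = rng.standard_normal((d, 8 * 8 * 3)) * 0.02

# Single token stream: input tokens followed by target ray tokens
x = np.concatenate([src_tokens @ W_src, tgt_rays @ W_ray])
h = attention(x, x, x)                      # attention does all the "geometry"
pred_patches = h[-16:] @ W_out              # read RGB patches off the target tokens
print(pred_patches.shape)                   # (16, 192)
```

The point of the sketch is structural: nothing in the computation encodes camera geometry explicitly; any cross-view reasoning must be learned by the attention weights from data.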
-
Free, publicly-accessible full text available June 1, 2026
-
Abstract: Metal-free electrocatalysts represent a main branch of active materials for the oxygen evolution reaction (OER), but they rely heavily on functionalized conjugated carbon materials, which substantially restricts the screening of potentially efficient carbonaceous electrocatalysts. Herein, we demonstrate that a mesostructured polyacrylate hydrogel can afford an unexpected and exceptional OER activity, on par with that of the benchmark IrO2 catalyst in alkaline electrolyte, together with high durability and good adaptability across various pH environments. Combined theoretical and electrokinetic studies reveal that the positively charged carbon atoms within the carboxylate units are intrinsically active toward OER, and operando spectroscopic characterizations also identify the fingerprint superoxide intermediate generated on the polymeric hydrogel backbone. This work expands the scope of metal-free materials for OER by providing a new class of polymeric hydrogel electrocatalysts with great potential for extension.
-
For embodied agents, navigation is an important ability but not an isolated goal. Agents are also expected to perform specific tasks after reaching the target location, such as picking up objects and assembling them into a particular arrangement. We combine Vision-and-Language Navigation, assembling of collected objects, and object referring expression comprehension, to create a novel joint navigation-and-assembly task, named ARRAMON. During this task, the agent (similar to a PokeMON GO player) is asked to find and collect different target objects one-by-one by navigating based on natural language (English) instructions in a complex, realistic outdoor environment, but then also ARRAnge the collected objects part-by-part in an egocentric grid-layout environment. To support this task, we implement a 3D dynamic environment simulator and collect a dataset with human-written navigation and assembling instructions, and the corresponding ground truth trajectories. We also filter the collected instructions via a verification stage, leading to a total of 7.7K task instances (30.8K instructions and paths). We present results for several baseline models (integrated and biased) and metrics (nDTW, CTC, rPOD, and PTC), and the large model-human performance gap demonstrates that our task is challenging and presents a wide scope for future work.
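Of the metrics listed above, nDTW (normalized Dynamic Time Warping) is the standard path-fidelity score in vision-and-language navigation: it compares the agent's trajectory to the reference path and maps the DTW cost into (0, 1]. A minimal sketch, assuming 2D waypoints and a hypothetical success-threshold distance of 3.0 (the actual threshold depends on the environment's scale):

```python
import math

def dtw(ref, pred):
    """Dynamic Time Warping cost between two paths given as lists of (x, y)."""
    n, m = len(ref), len(pred)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], pred[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # skip a predicted point
                                 cost[i][j - 1],      # skip a reference point
                                 cost[i - 1][j - 1])  # match both
    return cost[n][m]

def ndtw(ref, pred, threshold=3.0):
    """Normalized DTW in (0, 1]; 1.0 means the predicted path matches exactly."""
    return math.exp(-dtw(ref, pred) / (len(ref) * threshold))

# Usage: identical paths score 1.0; a path offset by one unit scores lower.
ref = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(ndtw(ref, ref))                                    # 1.0
print(ndtw(ref, [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]))   # < 1.0
```

The exponential normalization makes the score interpretable regardless of path length: small deviations decay the score gently, while large cumulative error drives it toward zero.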